rev 15 faq EIDE controller flaws part 1 of 2
From: roedy@BIX.com (Roedy Green)
Newsgroups: comp.os.os2.bugs
Subject: rev 15 faq EIDE controller flaws part 1 of 2
Date: 1 Sep 1995 01:08:35 GMT
Organization: Canadian Mind Products
Lines: 545
Message-ID: <425mej$hgo@news2.delphi.com>
NNTP-Posting-Host: bix.com
X-Newsreader: Galahad 1.1f
EIDE CONTROLLER FLAWS part 1 of 2
Revision 15: 1995 August 31
SUMMARY OF RECENT CHANGES
1) EIDEtest 1.5 and CDTest 1.0 released.
2) Yet another suspect EIDE controller chip: the SMC
37650.
3) Intel contradicts itself on the performance hit from
disabling prefetch to bypass the flaw.
4) Software from IBM and Intel to detect both faulty chips
directly.
5) The precise mechanism of failure for both the RZ-1000
and CMD 640B is now understood. The RZ-1000 and CMD 640B
both have the prefetch flaw. The CMD 640B has two additional
flaws.
6) Explanation of what "Intel Inside" means.
7) Dell offers upgrade BIOS to turn off the prefetch
buffers.
8) RZ-1000 flaw bypass for APAR PJ19409 for Warp now
available.
9) List of safe and unsafe operating system software.
10) IBM hardware is clean.
11) Stonewall rebuilds. Intel recants on offer to replace
defective motherboard.
12) Problem is showing up under Windows For WorkGroups in
32 bit mode.
13) Cleaning up past damage is very difficult.
14) Assigning blame.
15) The Triton chipset is immune. These chips are marked
with an FX suffix.
16) Windows-95, NT are immune.
17) DOS and Windows 3.1 are immune if you have an Intel
BIOS.
INTRODUCTION
There are serious flaws affecting about 1/3 of all PCI
motherboards. The flaws affect any motherboard or EIDE
controller paddleboard containing the PC-Tech RZ-1000 PCI
EIDE controller chip or the CMD PCIO 640B PCI EIDE
controller chip. There are preliminary reports of yet a
third flawed chip -- the SMC 37650.
The flaws affect motherboards from ASUSTeK, AT&T, Dell,
Gateway, Zeos and Intel. Since Intel makes so many of the
motherboards sold under other brand names, the flaws affect
many machines, both 486 and Pentium PCI.
The flaw shows up most frequently when you run a true
multitasking operating system such as OS/2 Warp. It also
shows up under Windows For WorkGroups in 32 bit mode during
tape or floppy backup and restore. In theory the flaw could
do damage under DOS, DESQview, Windows and Windows For
WorkGroups in 16 bit mode, but so far there have been no
damage reports. Recent versions of Microsoft NT and
Windows-95 contain code to bypass the flaw.
WHAT ARE THE SYMPTOMS?
When you are using an IDE or EIDE hard disk attached to the
EIDE motherboard port, the flaw subtly corrupts your files
by randomly changing bytes every once in a while. The flaw
introduces bugs into EXE files, subtle errors into your
spreadsheets, stray characters into your word processing
documents, changes to the deductions in last year's tax
return files, and random changes to engineering design
files.
This corruption happens when you are simultaneously using
your EIDE or IDE hard disk and some other device, most
commonly the floppy drive or mag tape backup.
The same sort of problem may occur when reading from a
CD-ROM drive attached to an EIDE port.
IS IT SERIOUS?
These flaws are nasty. They are causing hundreds of times
more havoc than the infamous Pentium divide flaw ever did.
"I am Pentium of Borg. You will be approximated."
Not only does this corruption occur, but it occurs quietly,
often going unnoticed.
If the system crashes, you usually put the blame on the
operating system software, or the application. It might
actually be a faulty RZ-1000 or CMD 640B EIDE controller
chip nailing you.
When a directory becomes corrupted, you may not notice it
until the damage is irreparable. If a spreadsheet
application reads a comma-delimited ASCII file, it may
simply miss a few bytes in a number, an error that may go
unnoticed, and that error could cascade through the rest of
the spreadsheet.
If you have had unexplained crashes in OS/2, you have
probably experienced the problem, and should make a thorough
check for hidden corruption. Remember that the bug may only
slightly alter your data, and the corruption may not be
obvious.
Keep in mind that not every problem is the RZ-1000's or the
CMD 640B's fault. Overheating, unrelated hardware faults and
design flaws, or software bugs can cause similar symptoms.
DMA channel conflicts also cause similar symptoms. Happily,
EIDEtest and CDTest can unmask all manner of simultaneous
I/O faults.
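The idea behind a simultaneous I/O test can be sketched in a few lines of Python. This is only an illustration of the principle: it is not EIDEtest or CDTest, and it makes no claim about how those programs work. The function names and the two scratch file paths are hypothetical. On a modern operating system the disk cache may serve the re-reads from memory, so treat this as a sketch of the technique rather than a reliable detector.

```python
import hashlib
import os
import threading


def sha256_of(path, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def readback_under_load(test_path, load_path, size=1 << 20, passes=5):
    """Write known data, then re-read it while a second thread does I/O.

    test_path: scratch file on the disk under suspicion (hypothetical).
    load_path: scratch file on the other device, e.g. a floppy (hypothetical).
    Returns the number of passes whose checksum did not match.
    """
    payload = os.urandom(size)
    expected = hashlib.sha256(payload).hexdigest()
    with open(test_path, "wb") as handle:
        handle.write(payload)

    stop = threading.Event()

    def background_io():
        # Keep the second device busy until the main thread is done.
        block = os.urandom(4096)
        while not stop.is_set():
            with open(load_path, "wb") as handle:
                handle.write(block)
                handle.flush()
                os.fsync(handle.fileno())

    worker = threading.Thread(target=background_io)
    worker.start()
    try:
        # Count the read-back passes that disagree with the data written.
        mismatches = sum(
            1 for _ in range(passes) if sha256_of(test_path) != expected
        )
    finally:
        stop.set()
        worker.join()
    return mismatches
```

A result of zero means no mismatch was seen on this run; it does not prove the hardware is sound.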
Unfortunately, correcting the problem just stops further
file corruption. It will not help to clean up the existing
damage to your files. Right now, the focus is on bypassing
the flaw. Preventing further corruption is child's play
compared with the nightmare of trying to track down all the
existing random errors in files. Backups even from day one
may be corrupted. If you have the flaw, you will probably
never be able to completely eliminate the effects of past
corruption.
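Where a trusted copy of some files does exist (for example, the original distribution media for your software), those files at least can be checked. The following Python sketch compares a suspect directory tree against a reference tree and lists the files whose contents differ. The function names and both directory arguments are hypothetical. It cannot help with files for which no trusted copy exists, and as noted above, a backup is not necessarily a trusted copy.

```python
import hashlib
import os


def file_digest(path, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(reference_root, suspect_root):
    """List relative paths present in both trees whose contents differ."""
    mismatched = []
    for folder, _subdirs, names in os.walk(reference_root):
        for name in names:
            ref_path = os.path.join(folder, name)
            rel = os.path.relpath(ref_path, reference_root)
            suspect_path = os.path.join(suspect_root, rel)
            if not os.path.isfile(suspect_path):
                continue  # files missing from the suspect tree are skipped
            if file_digest(ref_path) != file_digest(suspect_path):
                mismatched.append(rel)
    return sorted(mismatched)
```

A file that shows up in the list is not necessarily damaged by the controller; it may simply have been edited since the reference copy was made.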
HOW DO YOU TELL IF YOU HAVE THE FLAW?
There are four categories of motherboard:
1) Definitely safe. Motherboards may still have the flaw,
but all software in use bypasses it.
2) Probably safe. In theory there could be problems, but
no one has reported any so far.
3) Possibly dangerous. You will have to run EIDEtest,
CDTest, or IOTest to find out.
4) Probably dangerous. You will still have to run the
tests to find out for sure.
Definitely Safe
Definitely safe includes older machines with ISA, EISA, VESA
VL or MCA buses. The flaw only affects machines with the new
PCI bus. PCI machines that use the new Triton chipset from
Intel do not have the flaw.
PCI machines with Intel BIOSes that run only DOS, DESQview,
Windows 3.1, Windows-95 or NT 3.5 are safe. If you have a
non-Intel BIOS and run only DOS, DESQview, Windows 3.1,
Windows-95 or NT 3.5 and never use the "fast mode"
simultaneous disk I/O feature on floppy or tape
backup/restore, you are safe.
You still might want to test your machine. There are similar
problems with other causes that the tests will unmask.
Probably Safe
If you have a non-Intel BIOS and run only DOS, DESQview,
Windows 3.1, or Windows For WorkGroups in 16-bit disk access mode,
you probably will not see the problem, even though you may
have one of the faulty chips.
Possibly Dangerous
Most auxiliary chipsets (e.g., OPTI Viper, SMC, Mercury and
Neptune) used on PCI motherboards do not include a built in
EIDE controller. Such motherboards use a separate EIDE
controller chip -- often the flawed RZ-1000 or CMD 640B. If
you use a separate EIDE paddleboard, it will likely use
one of the flawed chips. In theory, the flaw could affect
DOS, Windows, and Windows For WorkGroups with 16 bit disk
access during floppy/tape backup and restore, though no one
has reported problems yet. Windows For WorkGroups with 32
bit disk access is dangerous if you have the flaw.
Probably Dangerous
PCI Motherboards (both 486 and Pentium) with the older
Mercury and Neptune chipsets are likely to have the flaw.
The Mercury chipset was popular in P60 and P66 systems, and
the Neptune in P75, P90 and P100 systems. Mercury chipsets
are labelled with an MX suffix and Neptune with NX. If you
are using NT 3.1, OS/2 Warp or Linux, you are likely to have
already experienced extensive file corruption if the flaw is
present.
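The four categories above can be restated as a rough decision procedure. The Python sketch below is one reading of the rules in this FAQ, not an official test. The function name, parameter names, and string labels are all made up for illustration. It cannot replace running the tests, since only they can tell whether a flawed chip is actually present. One assumption goes beyond the text: Windows For WorkGroups in 16-bit mode is treated as "probably safe" whatever the BIOS.

```python
NON_PCI_BUSES = {"ISA", "EISA", "VESA VL", "MCA"}

# Operating systems listed under "Definitely Safe" above.
BYPASS_OSES = {"DOS", "DESQview", "Windows 3.1", "Windows-95", "NT 3.5"}

# Operating systems listed under "Probably Safe" above.
NO_REPORT_OSES = {
    "DOS", "DESQview", "Windows 3.1", "Windows For WorkGroups 16-bit",
}

LIKELY_FLAWED_CHIPSETS = {"Mercury", "Neptune"}


def classify(bus, chipset, intel_bios, operating_systems,
             uses_fast_backup=False):
    """Return one of the four risk categories as a lowercase string.

    All labels are hypothetical; they only need to match the sets above.
    uses_fast_backup: True if "fast mode" simultaneous disk I/O is used
    during floppy or tape backup/restore.
    """
    oses = set(operating_systems)
    if not oses:
        raise ValueError("list at least one operating system")
    if bus in NON_PCI_BUSES:
        return "definitely safe"      # flaw affects PCI machines only
    if chipset == "Triton":
        return "definitely safe"      # Triton chipset is immune
    if oses <= BYPASS_OSES and (intel_bios or not uses_fast_backup):
        return "definitely safe"
    if oses <= NO_REPORT_OSES:
        return "probably safe"        # assumption: BIOS maker ignored here
    if chipset in LIKELY_FLAWED_CHIPSETS:
        return "probably dangerous"
    return "possibly dangerous"
```

For example, `classify("PCI", "Neptune", False, ["OS/2 Warp"])` falls through to "probably dangerous", which matches the Mercury and Neptune warning above.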
TESTING FOR THE FLAW
Scot Llewelyn, one of the eight authors of
PowerQuest's PartitionMagic, discovered the RZ-1000 flaw and
made it public. Prior to that, only employees of PC-Tech,
Intel and Microsoft were aware of how to bypass the flaw. In
the process of tracking the RZ-1000 problem down, Internet
comp.os.os2.bugs participants discovered a second flawed
chip, the CMD 640B, and are now suspicious about the
SMC 37650.
Scot did most of the initial work documenting the RZ-1000
flaw. He wrote a program called IOte